arXiv:1505.03853v1 [physics.soc-ph] 14 May 2015


A double-edged sword: Benefits and pitfalls of heterogeneous punishment in evolutionary inspection games


Matjaz Perc 1,2,3,* and Attila Szolnoki 4,†

1 Faculty of Natural Sciences and Mathematics, University of Maribor, Koroska cesta 160, SI-2000 Maribor, Slovenia
2 Department of Physics, Faculty of Sciences, King Abdulaziz University, Jeddah, Saudi Arabia
3 CAMTP - Center for Applied Mathematics and Theoretical Physics, University of Maribor, Krekova 2, SI-2000 Maribor, Slovenia
4 Institute of Technical Physics and Materials Science, Research Centre for Natural Sciences, Hungarian Academy of Sciences, P.O. Box 49, H-1525 Budapest, Hungary

As a simple model for criminal behavior, the traditional two-strategy inspection game yields counterintuitive 
results that fail to describe empirical data. The latter shows that crime is often recurrent, and that crime rates do 
not respond linearly to mitigation attempts. A more apt model entails ordinary people who neither commit nor 
sanction crime as the third strategy besides the criminals and punishers. Since ordinary people free-ride on the 
sanctioning efforts of punishers, they may introduce cyclic dominance that enables the coexistence of all three 
competing strategies. In this setup ordinary individuals become the biggest impediment to crime abatement. We 
therefore also consider heterogeneous punisher strategies, which seek to reduce their investment into fighting 
crime in order to attain a more competitive payoff. We show that this diversity of punishment leads to an 
explosion of complexity in the system, where the benefits and pitfalls of criminal behavior are revealed in 
the most unexpected ways. Due to the rise and fall of different alliances no less than six consecutive phase 
transitions occur in dependence on solely the temptation to succumb to criminal behavior, leading the population 
from ordinary people-dominated across punisher-dominated to crime-dominated phases, yet always failing to 
abolish crime completely. 


In 1982 Wilson and Kelling [1] introduced the “broken windows theory”, explaining how seemingly
unimportant and harmless signals of urban disorder may over time elicit antisocial behavior and
serious crime. The central premise of the theory is simple yet powerful, and it is reminiscent of
preferential attachment or the Matthew effect [2] with a negative connotation. Just as the more
connected nodes attract more new links during network growth [3], so does an unattended broken
window invite passersby to behave mischievously or even disorderly. Similarly, graffiti might point
to an unkept environment, signaling that more egregious damage will likely be tolerated as well.
One broken window is thus likely to become many broken windows, and the inception of urban decay
and criminal behavior is in place.

The simplicity of this widely adopted criminological theory invites mathematicians and physicists
to adopt a complex systems approach [4] to study criminal behavior [5], in particular since the
collective behavior of the system in this case can hardly be inferred from the relatively simple
individual actions. Emergent phenomena such as pattern formation including percolation [6–9] and
phase transitions are commonly associated with complex social and biological systems [10–13], and
in this realm the mitigation of crime is certainly no exception. Recent research highlights that
crime is far from being uniformly distributed across space and time [5, 14, 15], and this is
confirmed also by the dynamic nucleation and dissipation of crime hotspots [16–19] and the
emergence of complex criminal networks [20–23].

* Electronic address: matjaz.perc@uni-mb.si
† Electronic address: szolnoki@mfa.kfki.hu

The emergence of crime can also be treated as a social dilemma in as far as social order is the
common good that is threatened by criminal activity, with competition arising between criminals and
those trying to prevent crime. An adversarial evolutionary game with four competing strategies has
recently been proposed [27], where paladins are model citizens that do not commit crimes and
collaborate with authorities, while villains, at the other extreme of the spectrum, commit crimes
and do not report them. Intermediate figures are informants who report on other offenders while
still committing crimes, and apathetics who neither commit crimes nor report to authorities.
Apathetics are similar to second-order free-riders in the context of the public goods game with
punishment [28–31], in that they cooperate at first order by not committing crimes, but defect at
second order by not punishing offenders. Simulations have revealed that in the realm of the
adversarial game informants are key to the emergence of a crime-free society, and this has
subsequently been confirmed also with human experiments [32].

In general, the mitigation of crime can be framed as an evolutionary game with punishment, although
recent research has raised doubts on the use of sanctions as a means to promote prosocial behavior
[33–37]. Rewards for not committing and for reporting crime are a viable alternative, and in this
case the “stick versus carrot” dilemma becomes an important consideration [38–41]. In the context
of rehabilitating criminals, the question is also how much punishment for the crime and how much
reward for eschewing wrongdoing in the future is in order for optimal results, as well as whether
these efforts should be the responsibility of individuals or institutions [42–44] under the
assumption of a limited budget [45].

It is at this intersection of statistical physics of complex systems and evolutionary games that we
aim to contribute in the present paper by considering a three-strategy spatial inspection game with
uniform punishment as well as a five-strategy spatial inspection game with heterogeneous
punishment. The inspection game is a recognized model in the sociological literature for the
dynamics of crime [46, 47]. The game addresses the question of why anybody would be willing to
invest into costly punishment of criminals, given that individuals are tempted to benefit from the
punishing activities of others without actively contributing to them. As soon as ordinary people
are introduced who neither commit crimes nor contribute to their mitigation, one is thus faced with
the second-order free-rider problem [30, 48]. As we will show in what follows, this may introduce
cyclic dominance that enables the coexistence of all three competing strategies in the uniform
punishment model. More importantly, the consideration of heterogeneous punisher strategies
drastically elevates the complexity of possible solutions, revealing on the one hand a more
effective solution to the second-order free-rider problem, yet on the other hand still failing to
abolish crime completely. As a consequence, the diversity of punishment allows the formation of
different alliances between competing strategies, which gives rise to a sophisticated range of
solutions in dependence on the payoffs.

In the next Section we first present the details of the considered 3-strategy and 5-strategy
spatial inspection game, and then demonstrate how systematic Monte Carlo simulations reveal the
benefits and pitfalls of punishing criminal behavior. Simulation details are described in the
Methods Section. We conclude by discussing the presented results and their wider implications.


Results 

3-strategy and 5-strategy spatial inspection game 

We first introduce a three-strategy version of the spatial inspection game, where in addition to
criminals C and punishers P, also ordinary people O compete for space on an L × L square lattice
with periodic boundary conditions. We use the latter as the simplest network to account for the
fact that the interaction range among individuals in human societies is limited. The payoff matrix

\[
\begin{array}{c|ccc}
  & O & C & P \\ \hline
O & 0 & -\beta & 0 \\
C & \beta & 0 & \beta - 1 \\
P & -\alpha & \gamma - \alpha & -\alpha
\end{array}
\]

(rows give the strategy of the focal player, columns that of its opponent) contains α as the
punishment cost, β as the temptation to succumb to criminal behavior as well as the loss when being
a victim of crime, and γ as the reward for punishing criminals. Moreover, when a criminal faces a
punisher, it will receive β − 1, where −1 corresponds to the normalized punishment fine. These
payoffs apply for each pairwise interaction between the players.
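
As a minimal sketch, the three-strategy payoff matrix can be written as a lookup table. The integer coding of the strategies, the function name `payoff_matrix_3` and the use of NumPy are assumptions made for illustration only; the parameter names follow the symbols α, β and γ in the text.

```python
import numpy as np

# Integer labels for the strategies (an illustrative choice, not from the paper).
O, C, P = 0, 1, 2


def payoff_matrix_3(alpha: float, beta: float, gamma: float) -> np.ndarray:
    """Return M with M[s, t] = payoff of a player using s against an opponent using t."""
    return np.array([
        [0.0,    -beta,          0.0],         # ordinary person vs O, C, P
        [beta,    0.0,           beta - 1.0],  # criminal vs O, C, P (fine normalized to 1)
        [-alpha,  gamma - alpha, -alpha],      # punisher vs O, C, P
    ])


M = payoff_matrix_3(alpha=0.5, beta=0.8, gamma=0.8)
print("criminal facing a punisher:", M[C, P])
print("punisher facing a criminal:", M[P, C])
```

Note that the punisher pays the cost α in every interaction, but collects the reward γ only when the opponent is a criminal.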

To enable a more sophisticated response to the second-order free-rider problem, we also consider an
extended model with heterogeneous punishment. Similarly to other diversity-motivated social
problems [49–51], we expect that such a model will provide further insights and a more adequate
answer to the free-rider problem. In the proposed five-strategy version of the spatial inspection
game punishers are divided into three categories, namely L, M and H, depending on the cost they are
willing to bear for punishing criminals. The extended payoff matrix

\[
\begin{array}{c|ccccc}
  & O & C & L & M & H \\ \hline
O & 0 & -\beta & 0 & 0 & 0 \\
C & \beta & 0 & \beta - 1 & \beta - 1 & \beta - 1 \\
L & -\tfrac{1}{3}\alpha & \tfrac{1}{3}(\gamma - \alpha) & -\tfrac{1}{3}\alpha & -\tfrac{1}{3}\alpha & -\tfrac{1}{3}\alpha \\
M & -\tfrac{2}{3}\alpha & \tfrac{2}{3}(\gamma - \alpha) & -\tfrac{2}{3}\alpha & -\tfrac{2}{3}\alpha & -\tfrac{2}{3}\alpha \\
H & -\alpha & \gamma - \alpha & -\alpha & -\alpha & -\alpha
\end{array}
\]

contains the same three main parameters as the three-strategy payoff matrix, with the key
difference being that punishers L and M are willing to bear only 1/3 and 2/3 of the full punishment
cost α, respectively. Naturally, they also receive a proportionally smaller reward γ. Punishers H
correspond to punishers P in the three-strategy model in terms of their commitment to sanctioning
criminals, but we introduce a different notation for convenience.
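
The extended matrix can be generated from a single "effort" factor per punisher type, which makes the proportional scaling of cost and reward explicit. This is a sketch under stated assumptions: the names `STRATEGIES`, `EFFORT` and `payoff_matrix_5` are hypothetical, and the code simply mirrors the matrix entries given above.

```python
import numpy as np

STRATEGIES = ("O", "C", "L", "M", "H")
# Fraction of the full cost (and reward) each punisher type bears (from the text).
EFFORT = {"L": 1.0 / 3.0, "M": 2.0 / 3.0, "H": 1.0}


def payoff_matrix_5(alpha: float, beta: float, gamma: float) -> np.ndarray:
    """Return M with M[s, t] = payoff of strategy s against strategy t."""
    idx = {name: i for i, name in enumerate(STRATEGIES)}
    M = np.zeros((5, 5))
    M[idx["O"], idx["C"]] = -beta   # ordinary person is victimized by a criminal
    M[idx["C"], idx["O"]] = beta    # criminal gains from an ordinary person
    for name, effort in EFFORT.items():
        p = idx[name]
        M[idx["C"], p] = beta - 1.0                   # criminal is fined by any punisher
        M[p, :] = -effort * alpha                     # cost is paid in every interaction
        M[p, idx["C"]] = effort * (gamma - alpha)     # reward only against criminals
    return M


if __name__ == "__main__":
    np.set_printoptions(precision=3, suppress=True)
    print(payoff_matrix_5(alpha=0.5, beta=0.7, gamma=1.5))
```

Setting the effort of a punisher type to 1 reproduces the P row of the three-strategy matrix, which is why H and P are equivalent.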

Both the uniform three-strategy and the heterogeneous five-strategy spatial inspection game are
studied by means of Monte Carlo simulations, as described in the Methods section.


Evolutionary dynamics 

We begin by presenting the complete β–γ phase diagram at a representative value of the punishment
cost α in Fig. 1. It can be observed that criminals dominate if the reward for their punishment γ
is small. If the reward exceeds a certain value at a fixed temptation/loss β, then the punishers
become viable. At moderate β values, however, their presence is also accompanied by the emergence
of ordinary players. The stability of the O + C + P phase is due to cyclic dominance between the
three competing strategies [13]. In particular, within the O + C + P region ordinary people
outperform the punishers, the punishers defeat the criminals, while the criminals beat ordinary
people, thus closing the O → P → C → O loop of dominance. Conversely, for larger values of β, in
particular if β > α, the pure C phase becomes the two-strategy C + P phase via a second-order
continuous phase transition as γ increases. Moreover, at sufficiently large values of the reward γ,
the three-strategy O + C + P phase and the two-strategy C + P phase are separated by a second-order
continuous phase transition.

FIG. 1: Phase diagram of the three-strategy spatial inspection game with uniform punishment.
Depicted are strategies remaining on the square lattice after sufficiently long relaxation times as
a function of the temptation/loss β and the reward for punishing criminals γ, as obtained for the
punishment cost α = 0.5. Here C marks the parameter region where the population terminates in a
homogeneous “all-criminal” phase, C + P marks the region where criminals and punishers coexist,
while in the O + C + P region all three strategies are present in the stationary state due to
cyclic dominance. Solid blue lines denote continuous phase transitions, while the dashed red line
denotes the border of cyclic dominance between competing strategies.

For a more quantitative view, we present in Fig. 2 characteristic cross-sections of the phase
diagram shown in Fig. 1. These cross-sections confirm that criminals can dominate in the high
temptation/loss region or in the low reward region. Moreover, it can be observed that larger
rewards are beneficial for the punishers, but only up to a certain point. If γ increases beyond a
critical point ordinary people emerge, and as second-order free-riders they flourish at the expense
of those that punish criminal behavior. We emphasize that, interestingly, the payoffs of ordinary
people are independent of γ, yet still their fraction increases as γ increases. This
counterintuitive result is due to cyclic dominance, where feeding the prey, in this case the
punishers who do get larger payoffs for larger γ values, directly benefits the predator, which in
this case are the ordinary people [52, 53]. We can thus conclude that the real obstacle in the
fight against criminal behavior is the possibility of ordinary people to free-ride on the efforts
of punishers. A similar conclusion has been reached before for the evolution of cooperation in the
public goods game with punishers, where the free-riding problem of defectors is simply deferred to
the second-order free-riding problem of cooperators [28].
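
The statement that the payoffs of ordinary people do not depend on γ can be read directly off the O row of the three-strategy payoff matrix. As a short worked step, with n_O, n_C and n_P denoting the numbers of ordinary, criminal and punisher neighbors of a focal player on the square lattice (notation introduced here for illustration, with n_O + n_C + n_P = 4):

```latex
\begin{align}
\Pi_O &= 0 \cdot n_O - \beta\, n_C + 0 \cdot n_P = -\beta\, n_C , \\
\Pi_P &= -\alpha\, n_O + (\gamma - \alpha)\, n_C - \alpha\, n_P = \gamma\, n_C - 4\alpha .
\end{align}
```

Only the punisher's payoff grows with γ. Any gain of ordinary people at larger γ must therefore be indirect, which is consistent with the cyclic-dominance explanation given above.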

As a natural response of punishers to the harmful exploitation by ordinary people, we next consider
the five-strategy spatial inspection game with heterogeneous punishment. In particular, strategies
L and M try to eschew the exploitation by reducing the amount they contribute for sanctioning to
1/3 and 2/3 of the full cost, respectively. However, their reward is proportionally smaller as well
(see the extended payoff matrix in Section 2 for details). Due to the large number of competing
strategies and the resulting multitude of possible subsystem solutions we focus on the most
important parameter region where ordinary players survive in the uniform, three-strategy, model.
Accordingly, we explore a representative cross section when the reward is high enough for punishing
strategies to survive, and we explore how the system responds to the diversity of punishment.

FIG. 2: Two characteristic cross-sections of the phase diagram depicted in Fig. 1. Left panel shows
the fraction of the three strategies in dependence on the temptation/loss β at γ = 0.8. Starting at
the three-strategy O + C + P phase, the fraction of ordinary people and the criminals decreases
steadily with increasing the value of β until eventually O die out and the two-strategy C + P phase
is reached. Immediately thereafter the fraction of criminals starts rising as the value of β
increases further, with the second continuous phase transition marking the emergence of the pure C
phase. Right panel shows the fraction of the three strategies in dependence on the reward for
punishing criminals γ at β = 0.8. In this case we start at the pure C phase, which turns to the
two-strategy C + P phase as soon as γ is large enough to sustain the punishers. As γ increases
further ordinary people become viable too through a second continuous phase transition, ultimately
yielding the three-strategy O + C + P phase that is maintained by cyclic dominance. In both panels
the punishment cost is α = 0.5.

Results presented in the left panel of Fig. 3 confirm the effectiveness of resorting to
heterogeneous punishment in that second-order free-riders are able to survive only in a
significantly narrower interval of the temptation/loss β if compared to the uniform punishment
model. Furthermore, results presented in the right panel of Fig. 3 also give credence to the
expectation that the reduced viability of ordinary people will promote the evolution of punishers.
More precisely, we find that the uniform punishment strategy is significantly less effective than
heterogeneous punishment for almost the entire range of the temptation/loss β, except for a narrow
interval in the β > α region. As we will show in Fig. 4, this fact has important consequences for
the mitigation of criminal behavior in the population.

FIG. 3: Left panel shows the fraction of ordinary people in dependence on the temptation/loss β, as
obtained for the three-strategy spatial inspection game with uniform punishment and the
five-strategy spatial inspection game with heterogeneous punishment (see legend). It can be
observed that heterogeneous punishment is indeed more effective in eliminating second-order
free-riding by ordinary people than uniform punishment. Right panel shows the fraction of punishers
in dependence on the temptation/loss β for the uniform punishment model and the aggregate fraction
of all punishers in the heterogeneous punishment model, as well as the fraction of punishers L, M
and H individually (see legend). The success of heterogeneous punishment to eliminate second-order
free-riding is somewhat relativized, as higher punishment levels will not necessarily lead to lower
criminal levels (see Fig. 4 for an explanation). The origin of the zig-zag outlay of the aggregate
fraction of all punishers is analyzed in Fig. 5. In both panels the punishment cost is α = 0.5 and
the reward for punishing criminals is γ = 1.5.

Another peculiarity that can be observed in the right panel of Fig. 3 is the zig-zag outlay of the
aggregate fraction of all punishers in the five-strategy model. Yet this can be understood
thoroughly simply by looking at the fraction of punishers L, M and H individually. The mentioned
panel reveals clearly that low values of β are able to sustain only those punishers who are willing
to invest the lowest cost towards sanctioning criminals. The rank of the most viable punishers
subsequently increases from L over M to H as we increase β, and the solution of the five-strategy
model thus eventually becomes identical to the solution of the three-strategy model. Remarkably, we
can observe six consecutive phase transitions [(O + C + L) → (C + L) → (C + L + M) → (C + M) →
(C + M + H) → (C + H) → C] as we increase a single parameter, β. It is worth pointing out that the
reported increment of the punisher rank with increasing the temptation/loss β resonates with the
outcome of a recent human experiment [54], where, in the realm of a social dilemma, it was shown
that if cooperation is likely one should punish mildly.

FIG. 4: Top panel shows the fraction of criminals in dependence on the temptation/loss β, as
obtained for the three-strategy spatial inspection game with uniform punishment and the
five-strategy spatial inspection game with heterogeneous punishment (see legend). It can be
observed that heterogeneous punishment is more effective than uniform punishment in eliminating
crime only in the low β limit, which also agrees with the region in which second-order free-riding
is deterred more efficiently (see Fig. 3). In general, however, uniform punishment works just as
well or better than heterogeneous punishment in abating crime. Bottom panel again shows the
fraction of criminals, along with the different phases that contain the C strategy. Despite the
multitude of consecutive phase transitions in dependence on solely a single parameter, criminal
behavior is never completely eliminated. In both panels the punishment cost is α = 0.5 and the
reward for punishing criminals is γ = 1.5.

We continue with the results presented in Fig. 4, where we compare the effectiveness of uniform and
heterogeneous punishment to deter criminal behavior. Somewhat unexpectedly, it can be observed that
the possibility to resort to different levels of punishment does not necessarily work better than
uniform punishment in reducing crime. On the contrary, the fraction of C players is generally
higher over a large interval of β values when the heterogeneous punishment model is used. More
precisely, the fraction of criminals is lower only in the low temptation/loss region where L
punishers can adjust to this favorable condition. This observation is related to the failure of
heterogeneous punishment to eliminate second-order free-riding more effectively than uniform
punishment, and it indicates that sophisticatedly adjusted punishers may win a battle against
ordinary people, but lose the main war against the actual enemy, the criminals. While punishers can
lower the amount they invest towards sanctioning criminals, such a reduced effort also yields
smaller rewards. Interestingly, the positive side of lower costs can be utilized only if the
heterogeneity of punishers is maintained. The said effect becomes visible if we mark the borders of
different phases on the curve of criminals, as shown in the bottom panel of Fig. 4. As illustrated
there, the fraction of criminals can be a decaying function even if we increase the temptation/loss
β, but only as long as different types of punishers exist and compete against the criminals. As
soon as evolution favors a single punisher type, an effective response to an increase of the value
of β becomes absent. Lastly, we note that the conclusions attained with the results presented in
Figs. 3 and 4 remain generally valid also for all high temptation values.

FIG. 5: Time evolution of strategy distributions in the population, as obtained with the
heterogeneous punishment game starting from the same prepared initial state (leftmost panel) for
γ = 1.5 and three different values of the temptation/loss: (a)-(d) β = 0.5, (e)-(h) β = 0.9, and
(i)-(l) β = 0.7. The resulting three different stationary states are reached within 400 MCS, which
are depicted in the rightmost panels. Colors red, light blue and dark blue depict the location of
C, L and M players, respectively. For visual clarity, we have used a small 150 × 150 system size.
See main text for a detailed description of the different evolutionary outcomes.

To obtain a better understanding of the origin of the zig-zag outlay of criminals depicted in
Fig. 4, we monitor the time evolution of the distribution of strategies in the population for three
different combinations of payoff parameters, as shown in Fig. 5. We emphasize that the main
mechanism responsible for the formation of different stationary states is the different motion of
interfaces that separate the possible solutions of the system. Accordingly, we follow the evolution
of interfaces starting from a prepared initial state, but for clarity only two types of punishers
are present because this minimal model is sufficient to capture the essence of the emerging effect.
The extrapolation to the full five-strategy model, however, is straightforward. For comparison, we
use an identical prepared initial state, as shown in the leftmost panel, for three representative
values of β. As in previous figures, red color depicts C players while light and dark blue depict
the L and M punishers, respectively. Before discussing each specific case, we note that,
individually, L always beats M due to the lower cost of inspection. When the temptation/loss is
low, as shown in panels (a)-(d), M can beat C very efficiently, while L is unable to do the same
but simply coexists with the criminals. The superiority of L over M, however, will result in a
shrinking area of the M domain, as shown in panel (b). Ultimately, this fact leads to the
extinction of strategy M, despite the fact that it is more successful in deterring criminals than
strategy L. As soon as M die out, as shown in panel (c), criminals can exploit the milder
punishment from strategy L and spread towards the stationary state, as shown in panel (d). A
seemingly surprising and counterintuitive result is that criminals, who can coexist with L players
but are defeated by M players, are able to survive while their “predators” (M) go extinct. But in
fact, the evolution depicted in the panels (a)-(d) simply illustrates the actual consequence of
second-order free-riding. Namely, L players exploit the more altruistic M players by contributing
less to sanctioning criminals. In the absence of M players, however, the common enemy (C) can
spread relatively freely and reach a significantly high level (f_C ≈ 0.46).
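
The remark that L individually beats M follows from the extended payoff matrix alone. As a worked illustration for the parameters used in Fig. 5 (α = 0.5, γ = 1.5), the per-interaction payoffs of the two punisher types are as follows; the shorthand π(X, Y) for the payoff of X against Y is introduced here only for this example.

```latex
\begin{align}
\pi(L, L) = \pi(L, M) &= -\tfrac{1}{3}\alpha = -\tfrac{1}{6}, &
\pi(M, L) = \pi(M, M) &= -\tfrac{2}{3}\alpha = -\tfrac{1}{3}, \\
\pi(L, C) &= \tfrac{1}{3}(\gamma - \alpha) = \tfrac{1}{3}, &
\pi(M, C) &= \tfrac{2}{3}(\gamma - \alpha) = \tfrac{2}{3}.
\end{align}
```

Inside a criminal-free punisher domain L thus always earns more than M, because it pays half the cost. M earns more only in direct contact with criminals, so which punisher prevails depends on the local neighborhood and cannot be read off from pairwise payoffs alone.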

Interestingly, when M players are less successful in deterring C players, the outcome is completely
the opposite, as shown in panels (e)-(h) of Fig. 5. Since the temptation/loss β = 0.9, C are able
to coexist with M. The coexistence of C and L strategies is also still possible, and at the same
time L continue to invade the pure M phase [the invasion ends in panel (f)]. However, L become
ineffective against the C + M alliance. Indeed, this two-strategy alliance is so powerful that it
beats the other C + L alliance completely. The competition between the two alliances starts in
panel (g), and it terminates with the total victory of the C + M alliance in panel (h). The
conclusion is similar to that in the preceding case. Namely, when the evolution selects only one
type of punishers, then criminals have a reasonable chance to survive. Note that the fraction of
criminals in the stationary state is again relatively high, f_C ≈ 0.40, despite substantial
punishment.

The most favorable outcome can be obtained at an intermediate temptation/loss value, as shown in
panels (i)-(l) of Fig. 5. The β = 0.7 value is still high enough to maintain the coexistence of the
C + M alliance, but it lessens its evolutionary advantage in that the C + L alliance is able to
survive. The stationary state thus contains three strategies, whereby a relatively small portion of
the population, f_C ≈ 0.27, is occupied by criminals. We thus conclude that, in the long run, if
different punisher strategies survive in the stationary state, heterogeneous punishment may be
utilized successfully to mitigate crime better than uniform punishment. Note that f_C is a
decreasing function of β in the three-strategy phase in Fig. 4, while it is always increasing when
homogeneous punishment is applied (in the C + L, C + M, or C + H phases). This is because
heterogeneous punishment enables the validation of the most effective approach against crime:
sometimes moderate efforts, yielding milder fines, serve the interest of the whole population
better than severe punishment. Even more importantly, the simultaneous presence of different types
of punishers enables a synergy among them in that one strategy (in our case M) can lower the payoff
of criminals significantly while the other strategy (L) can still enjoy a more competitive payoff
due to a smaller cost. This multi-point effect is conceptually similar to when the duty of
punishment is shared stochastically among cooperative players [45]. Of course, as we have already
emphasized, these conclusions remain valid and can be extrapolated to a larger number of different
punisher strategies.


Discussion 

We have studied the effectiveness of punishment in abating criminal behavior in the spatial
inspection game with three and five competing strategies, entailing criminals, ordinary people and
punishers. In the five-strategy game, we have introduced three different types of punishers,
depending on the amount they are willing to contribute towards sanctioning criminals. We have shown
that cyclic dominance plays an important role in that it maintains the survivability of seemingly
subordinate strategies through indirect support. For example, increasing the reward for punishing
criminals might promote second-order free-riding of ordinary people, despite the fact that it
should support the punishers. This is due to cyclic dominance, where directly promoting the prey,
in this case the punishers, benefits the predator, which in this case are the ordinary people.
Moreover, we have shown that the actual obstacle in the fight against criminal behavior is the
possibility of ordinary people to free-ride on the efforts of punishers, which is also the main
culprit behind the establishment of cyclic dominance. In general, sanctioning criminal behavior is
thus a double-edged sword. The obvious benefit is that the evolution of crime is contained and is
unable to dominate in the population. The pitfall is that, in conjunction with ordinary people,
punishment creates conditions that support cyclic dominance, which prevents the complete
abolishment of crime even if the sanctions are severe and effective.

In addition to these observations, we have shown that the possibility of heterogeneous punishment
yields a highly ambiguous measure against criminal behavior. At specific parameter values it can
happen that milder punishers play the role of second-order free riders, which ultimately prevents
the complete elimination of crime [see panels (a)-(d) in Fig. 5]. Evidently, the reverse process is
also possible in structured populations where the more altruistic punishers can separate from
second-order free riders and win the indirect territorial battle [31, 55]. But in the realm of the
studied inspection game, we have also observed that the diversity of punishers can yield a more
favorable social outcome even as the temptation to do crime is growing. In the latter case, the
simultaneous presence of different punishers provides an advantageous coexistence: some punishers
ensure a higher fine to criminal players while other punishers can benefit from a lower cost due to
a less intensive engagement. Importantly, neither of these two options is effective on its own, but
together they improve the effectiveness of combating crime.

Notably, the emergence of cyclic dominance due to strategic complexity has been reported before,
for example in public goods games with volunteering [56], peer punishment [31, 57–59], pool
punishment [43, 44] and reward [39, 60], but also in pairwise social dilemmas with coevolution
[61, 62]. Other counterintuitive phenomena that are due to cyclic dominance [13] include the
survival of the weakest [52, 65], the emergence of labyrinthine clustering [66], and the
segregation along interfaces that have internal structure [63], to name but a few examples.
Cyclical interactions are thus in many ways the culmination of evolutionary complexity [13], and we
here show that they likely play a prominent role in deterring crime as well. However, while the
beneficial role of cyclic dominance for maintaining biodiversity is undeniable, one has to concur
that it is a rather unsatisfactory outcome in terms of fighting criminal behavior. That is the sort
of diversity in behavior that human societies could happily do without, yet it seems that this is
precisely the trap the current system has fallen into. Indeed, data from the Federal Bureau of
Investigation (see Fig. 2 in Ref. [5]) indicate that crime, regardless of type and severity, is
remarkably recurrent. Although positive and negative trends may be inferred, crime events between
1960 and 2010 fluctuate across time and space, and there is no evidence to support that crime rates
are permanently decreasing. The search for more effective crime mitigation strategies is thus in
order, in particular for those where the permanent elimination of crime is not an a priori
impossibility.


Methods 

For both the 3-strategy and the 5-strategy spatial inspection game the Monte Carlo simulation
procedure is the same. Initially all competing strategies are distributed uniformly at random on
the square lattice. We note, however, that the reported final stationary states are largely
independent of the initial fractions of strategies. Subsequently, in agreement with the random
sequential update protocol, a randomly selected player x acquires its payoff Π_x by playing the
game pairwise with all its four neighbors. Next, player x randomly chooses one neighbor y, who then
also acquires its payoff Π_y in the same way as previously player x. Once both players acquire
their payoffs, player x adopts the strategy s_y from player y with a probability determined by the
Fermi function

\[
W(s_y \to s_x) = \frac{1}{1 + \exp[(\Pi_x - \Pi_y)/K]} , \qquad (1)
\]

where K = 0.5 quantifies the uncertainty related to the strategy adoption process [10, 68]. In
agreement with previous works, the selected value ensures that strategies of better-performing
players are readily adopted by their neighbors, although adopting the strategy of a player that
performs worse is also possible [69, 70]. This accounts for imperfect information and errors in the
evaluation of the opponent.
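
Equation (1) translates into a one-line function. The sketch below is illustrative only; the function name `adoption_probability` is an assumption, while the default K = 0.5 is the value quoted in the text.

```python
import math


def adoption_probability(payoff_x: float, payoff_y: float, K: float = 0.5) -> float:
    """Probability that player x adopts the strategy of neighbor y, Eq. (1)."""
    return 1.0 / (1.0 + math.exp((payoff_x - payoff_y) / K))


# Equal payoffs give a coin flip; a better-performing neighbor is imitated more often,
# but a worse-performing one still has a nonzero chance of being imitated.
print(adoption_probability(1.0, 1.0))
print(adoption_probability(0.0, 1.0))
print(adoption_probability(1.0, 0.0))
```

For the payoff ranges of the inspection game the exponent stays small, so no overflow guard is included here; a production code with much larger payoff differences or much smaller K would need one.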

Each full Monte Carlo step (MCS) consists of L² elementary steps as described above, which are
repeated consecutively, thus giving a chance to every player to change its strategy once on
average. We typically use lattices with 600 × 600 players, although close to the phase transition
points up to 9000 × 9000 players had to be used to avoid accidental extinctions, and thus to arrive
at results that are valid in the large-size limit. The fractions of competing strategies f are
determined in the stationary state after a sufficiently long relaxation time lasting up to 10^5
MCS. In general, the stationary state is reached when the average of the strategy fractions becomes
time-independent. Moreover, to account for the differences in initial conditions and to further
improve accuracy, the final results are averaged over up to 100 independent runs for each set of
parameter values.
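
The procedure described in this section can be sketched for the three-strategy game as follows. This is a toy illustration under stated assumptions, not the authors' code: all function names are hypothetical, the lattice is far smaller and the run far shorter than the 600 × 600 lattices and up to 10^5 MCS quoted above, and no averaging over independent runs is done, so the printed fractions should not be expected to reproduce the reported phase diagram.

```python
import math
import random

O, C, P = 0, 1, 2  # illustrative integer coding of the strategies


def payoff_matrix(alpha, beta, gamma):
    """M[s][t] = payoff of strategy s against strategy t (three-strategy game)."""
    return [
        [0.0, -beta, 0.0],
        [beta, 0.0, beta - 1.0],
        [-alpha, gamma - alpha, -alpha],
    ]


def neighbors(i, j, L):
    """Four nearest neighbors on an L x L lattice with periodic boundaries."""
    return [((i - 1) % L, j), ((i + 1) % L, j), (i, (j - 1) % L), (i, (j + 1) % L)]


def total_payoff(grid, i, j, M):
    """Payoff accumulated from pairwise games with all four neighbors."""
    s = grid[i][j]
    return sum(M[s][grid[a][b]] for a, b in neighbors(i, j, len(grid)))


def monte_carlo_step(grid, M, K, rng):
    """One full MCS: L*L elementary steps of random sequential update."""
    L = len(grid)
    for _ in range(L * L):
        i, j = rng.randrange(L), rng.randrange(L)   # player x
        a, b = rng.choice(neighbors(i, j, L))       # neighbor y
        if grid[i][j] == grid[a][b]:
            continue  # imitation would change nothing, so skip the payoff evaluation
        px = total_payoff(grid, i, j, M)
        py = total_payoff(grid, a, b, M)
        if rng.random() < 1.0 / (1.0 + math.exp((px - py) / K)):
            grid[i][j] = grid[a][b]                 # x adopts the strategy of y


def fractions(grid):
    """Fractions of O, C and P players on the lattice."""
    counts = [0, 0, 0]
    for row in grid:
        for s in row:
            counts[s] += 1
    n = len(grid) ** 2
    return [c / n for c in counts]


def simulate(L=30, alpha=0.5, beta=0.8, gamma=0.8, K=0.5, steps=50, seed=1):
    rng = random.Random(seed)
    grid = [[rng.randrange(3) for _ in range(L)] for _ in range(L)]  # random start
    M = payoff_matrix(alpha, beta, gamma)
    for _ in range(steps):
        monte_carlo_step(grid, M, K, rng)
    return fractions(grid)


if __name__ == "__main__":
    f_O, f_C, f_P = simulate()
    print(f"f_O = {f_O:.3f}, f_C = {f_C:.3f}, f_P = {f_P:.3f}")
```

Extending the sketch to the five-strategy game only requires swapping in the 5 × 5 payoff matrix and drawing initial strategies from five labels; the update rule itself is unchanged.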